Tools for assessing health research partnership outcomes and impacts: a systematic review

Objective To identify and assess the globally available valid, reliable and acceptable tools for assessing health research partnership outcomes and impacts.
Methods We searched Ovid MEDLINE, Embase, CINAHL Plus and PsycINFO from origin to 2 June 2021, without limits, using an a priori strategy and registered protocol. We screened citations independently and in duplicate, resolving discrepancies by consensus and retaining studies involving health research partnerships, the development, use and/or assessment of tools to evaluate partnership outcomes and impacts, and reporting empirical psychometric evidence. Study, tool, psychometric and pragmatic characteristics were abstracted using a hybrid approach, then synthesized using descriptive statistics and thematic analysis. Study quality was assessed using the quality of survey studies in psychology (Q-SSP) checklist.
Results From 56 123 total citations, we screened 36 027 citations, assessed 2784 full-text papers, abstracted data from 48 studies and one companion report, and identified 58 tools. Most tools comprised surveys, questionnaires and scales. Studies used cross-sectional or mixed-method/embedded survey designs and employed quantitative and mixed methods. Both studies and tools were conceptually well grounded, focusing mainly on outcomes, then process, and less frequently on impact measurement. Multiple forms of empirical validity and reliability evidence were present for most tools; however, psychometric characteristics were inconsistently assessed and reported. We identified a subset of studies (22) and accompanying tools distinguished by their empirical psychometric, pragmatic and study quality characteristics. While our review demonstrated psychometric and pragmatic improvements over previous reviews, challenges related to health research partnership assessment and the nascency of partnership science persist.
Conclusion This systematic review identified multiple tools demonstrating empirical psychometric evidence, pragmatic strength and moderate study quality. Increased attention to psychometric and pragmatic requirements in tool development, testing and reporting is key to advancing health research partnership assessment and partnership science. PROSPERO CRD42021137932 Supplementary Information The online version contains supplementary material available at 10.1186/s12961-022-00937-9.


Background
The emphasis on and number of studies involving health research partnerships have grown substantially over the last decade [1]. Despite this evolving popularity and mounting demand for the systematic quantification of partnership outcomes and impacts, the assessment of health research partnerships has not kept pace [2]. Here, we refer to health research partnerships as those involving "individuals, groups or organizations engaged in collaborative, health research activity involving at least one researcher (e.g. individual affiliated with an academic department, hospital or medical centre), and any partner actively engaged in any part of the research process (e.g. decision or policy maker, healthcare administrator or leader, community agency, charities, network, patients, industry partner, etc.)" (p. 4) [3].
Although quantitative tools for assessing the outcomes and impacts of health research partnerships emerged in the late 1980s to early 1990s [5][6][7], available tools are largely simplistic and the assessment of outcomes and impacts in the health research partnerships domain is nascent [5,7,8,9,10,11,12,13]. Available studies are often hampered by a lack of rigorous measurement, including tool psychometric testing to establish evidence of validity and reliability. The limitations of existing studies fall into three categories. First, many primary studies select single-use and locally relevant tools as a core part of the partnership process, with a focus on monitoring their partnerships' progress and on bespoke outcomes and impacts of highest relevance to them [5,9]. Although most tool studies aim to incorporate partner views, track individual partnership progression and capture partner perspectives, few aim to create more universally applicable, standardized tools that can be used more broadly or for replication studies [10]. Second, many such studies are limited by small sample sizes and lack of iterative tool testing, which in turn contributes to the lack of psychometric evidence and a lack of evidence across a broader range of contexts. Third, primary studies in this domain are often limited by interchanging terminology, a lack of discrete concept definitions, problems associated with literature indexing, location and retrieval [3,14,15], and multiple tool-specific challenges including construct identification, definition, refinement and application [5,6,7,8,9,10,12].
Cumulatively, these challenges inhibit the evolution of partnership assessment and ultimately slow the advancement of partnership science [9,10]. A recent overview of reviews examining quantitative measures to evaluate impact in research coproduction suggests that investigators must "engage more openly and critically with psychometric and pragmatic considerations when designing, implementing, [evaluating] and reporting on measurement tools" (p. 163) [8]. There is an established rationale for developing robust, pragmatic measures that are both relevant to partners and usable in real-world settings; pragmatic tools are viewed as a critical accompaniment to pragmatic designs [16][17][18]. In this light, health research partnership tools should be relevant to partners, be actionable, have a low completion burden, and demonstrate adequate validity and reliability. Importantly, there is a need for tools that are broadly applicable, can be used for benchmarks with accompanying norms to aid interpretation, and that demonstrate strong psychometric and theoretical underpinnings, without causing harm [16]. Closing these gaps would help to facilitate tool use, advance the measurement of systematic partnerships and drive improvements in partnership science [8].
Numerous tools for assessing health partnership outcomes and impacts have been identified in previous reviews focused on specific partnership domains, partner groups or contexts [5][6][7][8][9][10][11][12]; however, scope restrictions in these reviews preclude our understanding of tools across health research partnership traditions. These reviews also reveal that information about tool psychometric and pragmatic properties remains lacking. This study reviewed and systematically assessed globally available tools for the assessment of health research partnership outcomes and impacts to address documented gaps in both the psychometric and pragmatic characteristics of these assessment tools.
Our primary research question was as follows: what are the globally available, valid, reliable and acceptable tools for assessing the outcomes and impacts of health research partnerships? Our secondary research questions pertained to tool characteristics, including the

Search strategy and data sources
In consultation with an academic medical librarian (MVD), we iteratively developed a comprehensive search strategy using key papers and audit-improvement rounds to refine study catchment and feasibility [30]. The resulting health research partnership term clusters and the search strategy development methods have been applied to subsequent, parallel reviews [2,3,14,15,31]. We tested the strategy in Ovid MEDLINE to balance search sensitivity and scope [32]. The partnership search term cluster underwent peer review [33,34] by an academic librarian to test for conceptual clarity across multiple partnership approaches. The overall strategy was subjected to the Peer Review of Electronic Search Strategies (PRESS) checklist review by a second academic network librarian, resulting in the spelling correction of a single term. No restrictions for date, design, language or data type were applied. The search strategy was translated for all four databases (Additional file 1: Appendix S3).

Electronic databases
Using the a priori, unrestricted strategy, we searched MEDLINE (Ovid), Embase, CINAHL Plus and PsycINFO from inception through 2 June 2021, including two updates. The search generated a total of 56 123 citations, resulting in the screening of 36 027 de-duplicated records [35] and 2784 full-text papers, managed with EndNote™ X7.8.
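The screening flow reported above can be summarized arithmetically. The following is a minimal sketch that uses only the counts stated in this review (56 123 citations, 36 027 de-duplicated records, 2784 full-text papers, 48 included studies); the stage labels and the `retention` helper are illustrative and not part of the review's methods.

```python
# Counts are taken from the review; stage labels are illustrative only.
stages = [
    ("total citations retrieved", 56_123),
    ("de-duplicated records screened", 36_027),
    ("full-text papers assessed", 2_784),
    ("studies included", 48),
]


def retention(stage_list):
    """Return (label, count, share of the previous stage) for each later stage."""
    rows = []
    for (_, previous), (label, count) in zip(stage_list, stage_list[1:]):
        rows.append((label, count, count / previous))
    return rows


flow = retention(stages)
duplicates_removed = stages[0][1] - stages[1][1]

print(f"duplicates removed: {duplicates_removed}")
for label, count, share in flow:
    print(f"{label}: {count} ({share:.1%} of previous stage)")
```

A summary of this kind makes the proportion retained at each screening level explicit, which can help readers judge the feasibility burden of a broad search.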

Eligibility and screening
We kept studies involving health research partnerships that (i) developed, used and/or assessed tools (or an element or property of a tool) to evaluate partnership outcomes or impacts [5,36] as an aim of the study, and (ii) that also reported empirical evidence of tool psychometrics (e.g. validity, reliability). We excluded studies in which the main purpose of the partnership was recruitment and retention of study participants. Conference abstracts were excluded from the eligible literature only after full-text assessment or confirmation that the citations were preliminary or duplicate records, or were lacking sufficient abstraction detail [37]. Abstracts in languages other than English were passed through title/abstract (level 1 [L1]) screening but translated prior to full-text assessment (Table 1).
All titles/abstracts (L1) and eligible full-text studies (L2) were screened and assessed independently, in duplicate (KJM with JB, LP, LN, SS, SM, MK, CM, AG, LS, KA), and tracked in a Microsoft (MS) Excel [38] citation database and screening spreadsheets. We tested and revised screening tools at each stage of the review and employed a minimum calibration rule (Cohen's κ ≥ 0.60) [39] to align team members' shared understanding of concepts and the application of eligibility criteria [40][41][42][43]. To balance abstraction burden with data availability and complexity, full-text abstraction (study and tool characteristics) was undertaken using a hybrid strategy [22,44]. Eligible papers were independently abstracted by KJM and independently validated (MK, SS, SM, KP) [45] using a predefined coding manual. We resolved all discrepancies by consensus discussion [21,41]. Investigators were sought out to locate missing tools or for assistance in differentiating linked citations only [43]. At least two attempts were made to locate corresponding authors and tools when contact details or tools were incorrect or missing [3,5,14]. The assessment and abstraction/scoring of psychometric, pragmatic tool evaluation and study quality characteristics were also undertaken independently and in duplicate, with discrepancies resolved the same way.
Page 4 of 30 Mrklas et al. Health Research Policy and Systems (2023) 21:3
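The calibration rule (Cohen's κ ≥ 0.60) can be illustrated with a small calculation. This is a sketch only: the two reviewers' decisions are hypothetical, and the function below is a generic implementation of Cohen's kappa for two raters, not the review team's own tooling.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pilot batch of ten title/abstract decisions.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]

CALIBRATION_THRESHOLD = 0.60  # minimum kappa stated in the review
kappa = cohens_kappa(reviewer_1, reviewer_2)
calibrated = kappa >= CALIBRATION_THRESHOLD
print(f"kappa = {kappa:.2f}; calibration met: {calibrated}")
```

In this hypothetical batch the reviewers agree on 8 of 10 records, yet kappa falls below the threshold because much of that agreement is expected by chance when most records are excluded; under such a rule, a further calibration round would be needed before independent screening continued.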

Study and tool characteristics
Data pertaining to study and tool characteristics were abstracted per the protocol [29]. We anticipated challenges associated with consistent use of terminology as are commonly reported in this research domain (e.g. outcomes/impacts, partnership approaches, tool type) [3,8,14,15]. When this occurred, we used the terms most prominent in methodological descriptions. We coded health subdomains inductively based on key words and study purposes [46]. More than one code per study was used to describe the study subdomain, as required.

Empirical evidence of tool psychometrics
The empirical psychometric evidence for tools was evaluated for each identified tool. Informed by previous studies [6][7][8][9][10][11][12] and best-practice recommendations [17,18,36,47,48], we created an initial list of psychometric evidence types, and expanded this list iteratively when new sources were identified by included studies (Additional file 1: Appendix S3). Only studies reporting empirical psychometric evidence were retained in this review to (i) address the documented lack of research reporting psychometric evidence for health research partnership outcomes and impacts assessment tools, and (ii) advance our understanding about the presence and types of psychometric evidence available in existing literature beyond simple dichotomous labels (e.g. valid/not valid or reliable/not reliable). By synthesizing the presence of psychometric evidence across studies, we also aimed to highlight areas in which the nature and type of psychometric evidence could be improved and advance the science of partnership assessment. This approach necessarily focused on later testing and evaluation stages of tool development [49] but does not diminish the importance of conceptual and theoretical sources of evidence to establish tool reliability and validity as important precursor evidence sources. As previously reported, the identification and reporting of psychometric data was complex and varied substantially in level of detail. This was mitigated through iterative review, piloting and calibration; all abstraction discrepancies were independently, then collectively considered, then resolved to consensus through recurrent discussion.
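As one concrete instance of empirical reliability evidence, internal consistency is commonly summarized with Cronbach's alpha. The sketch below is a generic textbook calculation on hypothetical survey responses; it is included for illustration and does not reproduce any analysis from the included studies.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` is a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need at least two items and equal-length responses")
    # Variance of each item across respondents, summed over items.
    item_variance_sum = sum(pvariance(item) for item in zip(*responses))
    # Variance of the respondents' total scores.
    total_variance = pvariance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: five respondents answering a three-item scale (1 to 5).
example_responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
alpha = cronbach_alpha(example_responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Reporting the coefficient together with the level at which it was computed (tool, scale or subscale) is one way to move beyond a dichotomous reliable/not reliable label.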

Pragmatic tool evaluation criteria
We modified a set of consensus-built criteria developed by Boivin et al. [7,50] as an alternative to applying the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) criteria [17,18] due to the quality of reported data. The main purpose of the criteria checklist was to appraise the tools from the perspective of those intended to use the tools [7]. Team members iteratively modified and piloted the revised items. A final set of 20 criteria (five questions in each of four domains: Scientific Rigour, Partner Perspective, Comprehensiveness and Usability) was generated. Piloting confirmed that these criteria were a better fit for the level and detail present in the literature under examination, and provided a comprehensive, easily interpretable (single score) evaluation of scientific, partner, comprehensiveness and usability/accessibility properties for each tool (Additional file 1: Appendix S4). It is important to note that the original criteria were intended for use as a checklist, not a quality assessment [7]; we used them this way in our review. The modified criteria were applied independently and in duplicate to all tools [51], with discrepancies resolved by consensus. Tools were coded as toolkits in studies where multiple tools were described and intended for collective use; in these cases, tool characteristics were scored cumulatively and reported as a single tool.
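The structure of the checklist (20 criteria, five in each of four domains, yielding a single score per tool) can be sketched as follows. Binary met/not-met scoring is an assumption made for illustration, and the example ratings are hypothetical; the actual criteria and scoring guidance are in Additional file 1: Appendix S4.

```python
DOMAINS = ("Scientific Rigour", "Partner Perspective", "Comprehensiveness", "Usability")
ITEMS_PER_DOMAIN = 5  # five questions per domain, 20 criteria in total


def score_tool(ratings):
    """Return (domain scores, total score) from 0/1 ratings per domain.

    Assumes each criterion is rated 1 (met) or 0 (not met or unclear).
    """
    domain_scores = {}
    for domain in DOMAINS:
        items = ratings[domain]
        if len(items) != ITEMS_PER_DOMAIN or any(r not in (0, 1) for r in items):
            raise ValueError(f"{domain}: expected {ITEMS_PER_DOMAIN} ratings of 0 or 1")
        domain_scores[domain] = sum(items)
    return domain_scores, sum(domain_scores.values())


# Hypothetical ratings for a single tool.
example_tool = {
    "Scientific Rigour": [1, 1, 1, 0, 1],
    "Partner Perspective": [1, 0, 1, 0, 0],
    "Comprehensiveness": [1, 1, 1, 1, 0],
    "Usability": [1, 0, 1, 1, 0],
}
domain_scores, total_score = score_tool(example_tool)
print(domain_scores, f"total = {total_score}/20")
```

Keeping the domain scores alongside the total preserves the checklist's intent: the single score aids comparison, while the domain breakdown shows where a tool is strong or weak from a user's perspective.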

Study quality assessment: the quality of survey studies in psychology (Q-SSP) checklist [52]
Study quality assessments typically assess the degree to which adequate measures were taken to minimize bias and avoid errors throughout the research process [53], and are hence design-focused. After piloting several quality appraisal tools with the eligible literature, we found that the best-fitting tool was an assessment of survey methods, namely the Q-SSP appraisal checklist and guide (Additional file 1: Appendix S5). The Q-SSP checklist was developed to address a wide variety of research and to help investigators differentiate broadly acceptable from lower-quality studies [52] using a four-stage process comprising evidence review, expert consensus, checklist refinement and criterion validity testing [52]. Q-SSP assessments were undertaken independently, in duplicate, and we resolved discrepancies by consensus.

Analysis
Basic descriptive statistics including means, standard deviations and frequencies were calculated to synthesize quantitative study, tool, psychometric and pragmatic characteristics in MS Excel [38] and Stata v13.1 software [54]. The synthesized data were consolidated into tables. Scores for each of the pragmatic and tool evaluation criteria (mean/standard deviation) were synthesized and reported by criterion, domain and overall sample. We synthesized qualitative variables using thematic analysis [46] in NVivo v12.7 [55], in keeping with the overarching descriptive-analytical approach for the review [56], and used existing reporting guidelines to organize the findings [57][58][59]. Finally, study quality assessments (Q-SSP) [52] were documented by calculating an overall quality (%) and four domain-specific scores (ratios) for each study.
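The study quality summary described above (an overall percentage plus four domain ratios per study, then means and standard deviations across studies) might be computed along these lines. The domain labels, item counts and ratings below are placeholders chosen for illustration, not the published Q-SSP items or the review's data.

```python
from statistics import mean, stdev


def quality_summary(item_scores):
    """Return (overall quality %, {domain: (items met, items assessed)}).

    `item_scores` maps a domain label to a list of 0/1 judgements (1 = criterion met).
    """
    domain_ratios = {domain: (sum(items), len(items)) for domain, items in item_scores.items()}
    met = sum(m for m, _ in domain_ratios.values())
    assessed = sum(n for _, n in domain_ratios.values())
    return 100 * met / assessed, domain_ratios


# Hypothetical ratings for one study; labels and item counts are placeholders.
study_a = {
    "introduction": [1, 1, 0, 1],
    "participants": [1, 1, 0],
    "data": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    "ethics": [0, 1, 0],
}
overall_a, ratios_a = quality_summary(study_a)
print(f"overall quality = {overall_a:.0f}%, domain ratios = {ratios_a}")

# Descriptive statistics across several hypothetical studies' overall scores.
overall_scores = [overall_a, 55.0, 80.0]
print(f"mean = {mean(overall_scores):.1f}, SD = {stdev(overall_scores):.1f}")
```

Reporting domain ratios rather than percentages keeps the number of items behind each domain score visible, which matters when domains differ in length.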

Study characteristics
Eligible studies comprised English-language and a single French-language report originating mostly in North America (39) and Europe (9), with a small remainder from South Africa (3), Australia (1) and Taiwan (1). Five dual-site studies involved the United Kingdom and South Africa (3), Canada and Australia (1), and Mexico and the United States (1) (Table 2).
The eligible literature was widely dispersed, with exactly half of the publications (24, 50%) published in the same number of journals. Several small publication clusters were identified, including seven studies in Health Education & Behaviour (15%), three each in the American Journal of Community Psychology, Global Health Promotion and theses (each 6%), and two each in Health Promotion International, Public Health Nursing, Evaluation and Program Planning and Health Promotion Practice (each 4%). As shown in Fig. 2, just under half of the identified literature was published after 2014 (20, 42%), and the earliest study was published in 1996.
The studies were conducted in multiple health subdomains (Fig. 3), including health promotion, prevention and public health (19), and disease-specific domains [i.e. cancer, mental health and substance use/harm reduction, and sexually transmitted/blood-borne infections and sexual health (12)]. The smaller subdomains included community health and development (7), special populations (e.g. primary care, paediatric/adolescent health, and immigrant and geriatric health) (6), partnerships (6), health equity (4) and health services research (3).

Tool characteristics
Included studies yielded 58 tools. The characteristics of the included tools are summarized in Table 3.

Pragmatic tool evaluation scores
Tables 4 and 5 present a synthesis of pragmatic tool evaluation criteria [7] (Additional file 1: Appendix S4). Tool comprehensiveness was high in terms of documenting outcomes and/or impacts (100%), partnership process (95%) and context (97%); however, tools lacked deliberate design for recurrent monitoring of partnerships (33%).
In terms of Scientific Rigour, tools were not typically informed by systematic evidence (17%) but were conceptually grounded (90%) and presented evidence for both validity and reliability (90% and 93%, respectively, inclusive of both empirical and theoretical/conceptual sources). Only half of the tools were explicitly based on the experiences and expertise of partners (55%).
Overall, tool Usability was mixed. Tool purpose was always present (100%), but only half of the tools were freely accessible (50%), considered easy to read and understand (53%), accompanied by instructions (57%) and available in a readily usable format (62%). Tools were generally designed to be self-administered (97%), but not for reporting back to partners (28%). The level of partner involvement was not commonly included (28%), and partners were deliberately involved as codesigners in only 59% of studies, despite frequent capture of partner influence (76%).
Table 4 Pragmatic tool evaluation consolidated scores (n = 58 tools). (·) conceptual underpinnings not explicitly identified

Psychometric assessment
Psychometric testing and reporting were widely variable and challenging to assess, primarily due to inconsistent or incomplete testing and reporting, and variable reporting detail. Almost three quarters of studies presented two or more forms of psychometric evidence for validity (35, 73%); eight studies (17%) presented two forms of evidence for reliability. Iterative assessment and abstraction of psychometric evidence revealed reliability evidence in four categories (internal consistency, test-retest reliability, inter-rater reliability and other). The most frequently occurring form of reliability evidence was internal consistency (83%). Validity evidence
Fig. 4 Pragmatic tool assessment-criteria total scores (n = 58 tool scores)
Table 6 Consolidated tool psychometric evidence (n = 58 tools; columns: psychometric criteria, code frequency (n)). Each of the bolded lines denotes the overarching category of psychometric evidence described. Lines beneath each bolded category present the respective types of reliability, validity, norms and interpretability we identified from the selected literature
The problem and target population were generally well described and participant sampling and recruitment details present, but operational definitions (32, 67%), research questions and hypotheses (24, 50%) and sample size justification were often lacking (35, 75%). There were strong links between the proposed and presented analyses (46, 96%), but the study measures themselves were frequently missing from reports or supplements (17, 35%). The provision of validity evidence for included measures was found lacking in almost a third of studies (14, 29%), and most studies lacked detail about those collecting data (42, 88%), the duration of data collection (29, 60%) and the study context (25, 52%). Explicit reference to informed consent/assent and the inclusion of participants in post-data-collection debriefing were largely absent or unclear across included studies (29, 60% and 37, 77%, respectively).
Overall, four of the six studies with "acceptable" quality overlapped with studies reporting more comprehensive psychometrics [61,65,71,88], but only two overlapped with those reporting higher pragmatic tool criteria scores [61,71].

Discussion
This systematic review identified 58 tools for assessing health research partnership outcomes and impacts with tool psychometric evidence and pragmatic characteristics. We were able to identify a group of noteworthy tools, distinguished by their psychometric evidence, tool pragmatic characteristics and study quality scores.

Key study-level comparative findings
Overall, the presence and reporting of empirical psychometric evidence and pragmatic characteristics appeared improved in our study compared with previous reviews, yet several challenges related to the nascency of this research field remain (e.g. lack of key term definitions and measurement clarity, term switching, a lack of studies with deliberate focus on tool development, testing, evaluation and improvement, variable and inconsistent reporting). Future research to advance partnership measurement and science should consider both psychometric improvements (with specific emphasis on increased consistency, level of tested and reported detail, and dedicated study) and pragmatic considerations (specifically on accessible tools that are better informed by partner experiences and expertise, designed for partnership monitoring, and quantifiably readable). In examining tools with empirical psychometric evidence, this study contributes to our understanding of existing partnership tool measurement strengths and gaps. Our review provides practical ways to advance partnership measurement and, ultimately, partnership science.
At the study level, our findings aligned with previous reviews in that most included studies were North American- and English-centric, with a wide publication dispersion pattern and mid-2010 emergence [2,7,8,11]. We also experienced previously reported challenges in the location of tools and author responsiveness [5,7]. Our study differed from others documenting a predominance of qualitative methods and relative rarity of quantitative tools, designs and methods [9,12,70,[90][91][92]. By contrast, our review deliberately sought and identified tools with empirical psychometric and pragmatic characteristics encompassing diverse health research approaches. This review identified studies employing cross-sectional and mixed-method/embedded survey designs and quantitative and mixed methods; this catchment is likely a function of our study inclusion criteria but may also reflect an increasing overall trend towards the quantification of partnership assessment [1, 7, 11-13, 92, 93].

Key tool-level comparative findings
On a tool level, we found similarities and differences between our study and previous, related reviews, but these studies differed in scope (e.g. literature, search period, research domains other than health, focus of measurement) and definitions of partnership, generating very different samples and eligible primary literature [2]. Our findings demonstrate the need for research deliberately focused on tool development, testing and evaluation. Like other related health research partnership reviews [7,8,10,94], we found that while tool purpose was universally reported, investigators focused almost exclusively on assessing and understanding the characteristics of bespoke partnerships. This was a consistent finding, despite the diverse scope and focus of these reviews (i.e. patient/public evaluation tools, community coalitions, coproduction impacts, and research collaboration quality and outcomes, respectively). Very few primary studies in our review focused specifically on tool validation or psychometric testing, although most involved one or more such activities. Furthermore, most studies were multifocal, that is, encompassing one or more tool development, modification, use, evaluation or validation activities simultaneously. These findings support previous reports regarding the paucity of focused health research partnership tool evaluation research [10,94]. Our findings strengthen existing recommendations targeting the systematic assessment of psychometric and pragmatic tool properties [8], and more deliberate funding of research on tool design, testing, improvement and evolvement in general [49]. These aspects are considered key to advancing partnership science measurement and partnership science as a field [8,9,70,95].
Conceptually, our study revealed a much higher presence of theoretical underpinnings at both the study and tool levels (91% each), compared with levels reported in other partnership tool reviews of patient/public and community coalition evaluation tools [7,94]. However, the implications of this finding remain unclear. Some authors have observed that theoretical/conceptual connections to both partnership and measurement theory rarely translate into operationalized tool elements [8,17]; this is an important area of future inquiry.
The tools we reviewed measured outcomes similarly, as compared with a recent review of patient/public partnership evaluation tools (52% vs 56%) [7]; however, in our study, we found that explicit definitions for outcome and impact terms were present intermittently and often interchanged. Terminology challenges have been reported in other systematic studies in the health research partnerships domain, noting the significant variance, overlap and omission of key term definitions from reports (i.e. terms for outcomes/impacts, partnership approaches and tool types) [9,14,15,96]. While comparative research and crosstalk among research partnership traditions is a relatively recent phenomenon [4,6,[96][97][98][99], clarity on key concepts, terminology, definitions, core measures and tools is fundamental to advancing partnership measurement and scientific inquiry [8,9,49,70].

Comparative findings: tool pragmatic characteristics, validity and reliability
Pragmatic tool evaluation scores were generally higher in our review than in Boivin and colleagues' review of patient partnership evaluation tools [7]. In our study, the highest mean domain scores were Comprehensiveness and Scientific Rigour, whereas Scientific Rigour was the lowest domain score in the Boivin review [7]. Importantly, we found that only a single tool overlapped between the reviews. This lack of overlap can be accounted for by differences in review scope, targets and inclusion criteria (i.e. the Boivin review focused on patient and public involvement evaluation tools and included tools for assessing engagement in both health system decision-making and health research, with narrower search terms over a shorter time span; and our review deliberately selected studies reporting empirical tool validity and reliability evidence). Tool validity (86%) and reliability (95%) evidence in our study was markedly higher and contrasted starkly with prior work [7,8], in which evidence for validity was found in only 48% and 7% of studies, respectively [7,8], and evidence for reliability was found in 45% and 35% of studies, respectively [7,8]. As noted previously, there was little to no overlap in captured tools between these reviews (n = 1 [7] and n = 13 [8], respectively), which can be similarly accounted for by differences in scope that generated different primary and secondary literature sets. The MacGregor overview of reviews [8] focused solely on reviews of tools to assess the impacts of research coproduction, differing by time span, key partnership terminology and key domains. As a result, only four of the eight identified reviews were considered in-scope; thus, the number of overlapping tools was limited (n = 13).

Future research
Boateng et al. [49] describe the requisite steps, activities and key precursors and concurrent factors required for robust tool development, testing and evaluation in the future. Specific attention to such steps and components could enable more deliberate tool evolvement in the health research partnership assessment domain. Specifically, the authors call for graduate-level training in the development and evaluation of tools, to create expertise in graduate students and research teams. Furthermore, the authors caution that this research can be "onerous, jargon-filled, unfamiliar, and resource intensive" (p. 1) [49]. Specific accommodations to offset resource and time intensity and higher participant burden due to larger sample sizes may be required. Health research partnership assessments must meet the needs of both researchers and end-users by balancing rigour and resource intensity in a way that remains fit for purpose. Both deliberate funding and the use of hybrid study designs will help to provide the required focus and generate robust evidence that addresses persistent psychometric and pragmatic gaps in future research.

Study limitations
We noted several key limitations with this review. We observed several challenges with respect to the evidence for and the testing of tool psychometric properties. Like Sandoval et al. [5], we experienced challenges related to the reporting of psychometrics on multiple levels (e.g. scale, index, subscale, item and tool), as well as mismatched use of psychometric evidence (e.g. justification or application of previous scale, subscale or item-specific psychometrics to other levels of testing). To mitigate this risk, we approached psychometric evidence in eligible studies with these issues in mind, and relied on strict methodological processes (independent, duplicate abstraction and review and resolution of all discrepancies through consensus discussions) to ensure accurate interpretation and representation of abstracted data.
As mentioned previously, the variable use of terminology may have compromised our ability to clearly describe and assess health research partnership tools. Further efforts to consolidate terms and definitions across health research partnership traditions will help resolve these issues in future work.
This study was limited in several ways by the accessibility and reporting concerns documented in previous reviews [3,5,7,14,15]. Most included studies were multimodal and did not often explicitly refer to tool development, testing or evaluation in their purpose statements. To mitigate the risk of missing potentially relevant studies in our review, we deliberately kept our inclusion criteria broad at the title and abstract (L1) screening phase. However, this strategy also produced a large set of L2 full-text assessments, negatively impacting study feasibility. Consensus and consolidation of evidence in this research domain, as well as more focused, explicit reporting of health research partnership assessment, tools and psychometric and pragmatic characteristics, will facilitate more efficient literature location, retrieval and assessment in the future. Finally, we noted a potential gap in the scope of a question modified as part of the pragmatic tool evaluation criteria: Was the tool informed by literature generated from a systematic literature search? In retrospect, we surmise that this question was too narrow to capture evidence derived from historical hypothesis testing generated by theoretically driven research (i.e. dimensionality tests) [49]. In addition to synthesis-level evidence for relevant components, tools or tool components that are informed by iterative tests of components derived from conceptual framework testing could play an equal or more important role in identifying and refining key tool constructs. Theoretically grounded components may also progressively improve the psychometric quality of health research partnership outcome and impact assessment tools. We recommend amending this question for use in future tool evaluation studies to better capture the full scope of relevant evidence underlying assessment tools.

Conclusions
This large-volume systematic review successfully identified empirically evidenced tools for the assessment of health research partnership outcomes and impacts. Our findings signal some promising improvements in the presence of conceptual, methodological and psychometric characteristics in measurement tools, and the availability of pragmatic tool characteristics. Persistent challenges linked to the nascency of the research partnership field and its measurement remain. Practically, the comprehensive tool characteristics presented here can help researchers and partners choose assessment tools that best fit their purposes and needs. Finally, our findings further strengthen calls for more deliberate and comprehensive tool development, testing, evaluation and reporting of psychometric and pragmatic characteristics to advance research partnership assessment and research partnership science domains.
Advancing knowledge of health research partnership outcomes and impacts assessment and partnership science are mandated aims of the IKTRN [100]. The IKTRN is a research network based at the Centre for Practice-Changing Research at the Ottawa Hospital and supported by the Canadian Institutes of Health Research. The IKTRN comprises researchers from more than 30 universities and research centres and research users from over 20 organizations, with a broad research agenda focused on best practices and their routine application to ensure effective, efficient and appropriate healthcare [101,102].
Additional file 1: Appendix S1. Systematic review protocol deviations and rationale. Appendix S2. Glossary of terms. Appendix S3. Translated search strategy. Appendix S4. Health research partnership pragmatic tool evaluation criteria. Appendix S5. Quality assessment checklist for survey studies in psychology (Q-SSP) criteria. Appendix S6. Bibliography of included studies. Appendix S7. PRISMA-systematic review checklist.